20 R Machine Learning and Data Science packages
These are the 20 most downloaded machine learning R packages from January to May 2015.
Note that the top six account for more than half of the downloads among these 20.
The CRAN Package repository features 6778 active packages.
R packages can be judged in several ways: many are favorites of Kagglers, some are endorsed by many authors, and some are ranked by how many other packages depend on them.
They are also rated and reviewed by users on Crantastic.org, a crowdsourced review site.
However, these user ratings are too few to base an analysis on.
Instead, let us see how often machine learning packages were downloaded from January to May 2015 by analysing the CRAN daily download logs.
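As a rough illustration of what analysing the daily downloads involves, the sketch below tallies downloads per package from log rows. It is written in Python and assumes the column layout of the daily CSV logs published for the RStudio CRAN mirror (one row per download, with a `package` column). The sample rows, the `count_downloads` helper and the package set are made up for illustration; this is not necessarily how the figures in this article were produced.

```python
import csv
import io
from collections import Counter

# Made-up sample rows in the assumed column layout of a daily CRAN log.
SAMPLE_LOG = """\
"date","time","size","r_version","r_arch","r_os","package","version","country","ip_id"
"2015-01-01","00:00:10",283642,"3.1.2","x86_64","mingw32","e1071","1.6-4","US",1
"2015-01-01","00:01:12",113220,"3.1.2","x86_64","linux-gnu","rpart","4.1-8","DE",2
"2015-01-01","00:02:45",283642,"3.1.1","x86_64","mingw32","e1071","1.6-4","IN",3
"2015-01-01","00:03:30",2318730,"3.1.2","x86_64","darwin13.4.0","ggplot2","1.0.0","US",4
"2015-01-01","00:04:02",4204301,"3.1.2","x86_64","mingw32","caret","6.0-41","GB",5
"2015-01-01","00:05:59",113220,"3.1.2","i386","mingw32","rpart","4.1-8","US",6
"2015-01-01","00:06:21",283642,"3.1.2","x86_64","linux-gnu","e1071","1.6-4","FR",7
"""


def count_downloads(log_files, packages):
    """Count log rows (one row = one download) for the given package names."""
    counts = Counter()
    for handle in log_files:
        for row in csv.DictReader(handle):
            if row["package"] in packages:
                counts[row["package"]] += 1
    return counts


# Hypothetical subset of machine learning packages to track.
ml_packages = {"e1071", "rpart", "caret"}

counts = count_downloads([io.StringIO(SAMPLE_LOG)], ml_packages)

# Print packages from most to least downloaded.
for name, n in counts.most_common():
    print(name, n)
```

For a real tally, each daily log file for the period would be opened and passed to `count_downloads` in place of the inline sample.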
- e1071
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, etc.
(142479 downloads)
- rpart
Recursive Partitioning and Regression Trees.
(135390)
- igraph
A collection of network analysis tools.
(122930)
- nnet
Feed-forward Neural Networks and Multinomial Log-Linear Models.
(108298)
- randomForest
Breiman and Cutler's random forests for classification and regression.
(105375)
- caret
The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process of creating predictive models.
(87151)
- kernlab
Kernel-based Machine Learning Lab.
(62064)
- glmnet
Lasso and elastic-net regularized generalized linear models.
(56948)
- ROCR
Visualizing the performance of scoring classifiers.
(51323)
- gbm
Generalized Boosted Regression Models.
(44760)
- party
A Laboratory for Recursive Partitioning.
(43290)
- arules
Mining Association Rules and Frequent Itemsets.
(39654)
- tree
Classification and regression trees.
(27882)
- klaR
Classification and visualization.
(27828)
- RWeka
R/Weka interface.
(26973)
- ipred
Improved Predictors.
(22358)
- lars
Least Angle Regression, Lasso and Forward Stagewise.
(19691)
- earth
Multivariate Adaptive Regression Spline Models.
(15901)
- CORElearn
Classification, regression, feature evaluation and ordinal evaluation.
(13856)
- mboost
Model-Based Boosting.
(13078)
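The observation that the top six packages account for more than half of the downloads can be checked directly from the counts listed above. Here is a small Python sketch that uses only the figures in this list (the variable names are illustrative):

```python
# Download counts for Jan-May 2015, copied from the list above.
downloads = {
    "e1071": 142479, "rpart": 135390, "igraph": 122930, "nnet": 108298,
    "randomForest": 105375, "caret": 87151, "kernlab": 62064,
    "glmnet": 56948, "ROCR": 51323, "gbm": 44760, "party": 43290,
    "arules": 39654, "tree": 27882, "klaR": 27828, "RWeka": 26973,
    "ipred": 22358, "lars": 19691, "earth": 15901, "CORElearn": 13856,
    "mboost": 13078,
}

total = sum(downloads.values())

# Sort counts from largest to smallest and add up the first six.
ranked = sorted(downloads.values(), reverse=True)
top_six = sum(ranked[:6])

share = top_six / total
print(f"top six: {top_six} of {total} downloads ({share:.1%})")
# prints: top six: 701623 of 1167229 downloads (60.1%)
```

Within these 20 packages, the top six therefore make up roughly 60% of the downloads.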
It is interesting to note that some open source R tools are also gaining popularity, such as Rattle, a GUI for data mining using R (35539 downloads), and fastcluster, fast hierarchical clustering routines for R and Python (14214 downloads).
Did we miss your favorites? Light up this space and contribute to the community by letting us know which R packages you use!
For completeness, here is data on 135 R package downloads, from Jan to May 2015.
Bio: Bhavya Geethika is pursuing a master's in Management Information Systems at the University of Illinois at Chicago.
Her areas of interest include Statistics & Data Mining for Business, Machine Learning and Data-Driven Marketing.